Guest Lecture for MIT 18.5096 Topics in Mathematics with Applications in Finance
Jonathan Larkin
October 2, 2025
Disclaimer
This presentation is for informational purposes only and reflects my personal views and interests. It does not constitute investment advice and is not representative of any current or former employer. The information presented is based on publicly available sources. References to specific firms are for illustrative purposes only and do not imply endorsement.
About Me
Managing Director at Columbia Investment Management Co., LLC, generalist allocator, Data Science and Research lead. Formerly CIO at Quantopian, Global Head of Equities at Millennium Management LLC, and Co-Head of Equity Derivatives Trading at JPMorgan.
The Condorcet Jury Theorem states that if each member of a jury has a probability greater than 1/2 of making the correct decision, then as the number of jurors increases, the probability that the majority decision is correct approaches 1.
\[
P(\text{majority correct}) \to 1 \text{ as } n \to \infty,
\]
provided each juror is correct with probability \(p > 1/2\) and jurors' errors are independent.
e.g., sklearn.ensemble.VotingClassifier relies on this result.
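A minimal simulation of the theorem under its assumptions (independent voters, each correct with probability \(p > 1/2\)); `majority_accuracy` is a hypothetical helper, not part of sklearn:

```python
import numpy as np

rng = np.random.default_rng(0)

def majority_accuracy(p: float, n_voters: int, n_trials: int = 20_000) -> float:
    """Fraction of trials in which a majority of independent voters,
    each correct with probability p, reaches the correct decision."""
    correct_votes = rng.binomial(n_voters, p, size=n_trials)
    return float(np.mean(correct_votes > n_voters / 2))

# Accuracy of the majority rises toward 1 as the number of voters grows.
for n in (1, 11, 101, 1001):
    print(n, round(majority_accuracy(0.55, n), 3))
```

With individually weak voters (p = 0.55), the majority is already near-certain at a thousand voters, but only if their errors are independent.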
Boosting Weak Learners (1988)
Kearns, Michael. Thoughts on Hypothesis Boosting. 1988.
Friedman, Jerome H. Greedy function approximation: A gradient boosting machine. 2001.
Sequentially train many “weak learner” models, each focusing on the errors of the previous ones.
Gradient boosted decision trees remain the dominant approach in tabular machine learning today.
\(F_M\) is the ensemble model. After M rounds: \[
F_M(x) = F_0(x) + \sum_{m=1}^M \gamma\, h_m(x)
\]
Each round fits \(h_m\) to the negative gradient of the loss at \(F_{m-1}\), then updates: \[
F_m(x) = F_{m-1}(x) + \gamma\, h_m(x)
\]
\(\gamma\) is the learning rate; \(h_m\) is a weak learner (e.g., shallow tree).
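The update rule above can be sketched end to end for squared loss, where the negative gradient is simply the residual. A toy version, with depth-1 stumps standing in for shallow trees (`fit_stump` and `gradient_boost` are illustrative names, not a library API):

```python
import numpy as np

def fit_stump(x, residual):
    """Depth-1 regression tree: best threshold split minimizing squared error."""
    best = None
    for thr in np.unique(x):
        left, right = residual[x <= thr], residual[x > thr]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, thr, left.mean(), right.mean())
    _, thr, lmean, rmean = best
    return lambda z: np.where(z <= thr, lmean, rmean)

def gradient_boost(x, y, n_rounds=50, gamma=0.1):
    """F_M(x) = F_0(x) + sum_m gamma * h_m(x), squared loss."""
    f0 = y.mean()                    # F_0: constant model
    pred = np.full_like(y, f0)
    stumps = []
    for _ in range(n_rounds):
        h = fit_stump(x, y - pred)   # negative gradient of squared loss = residual
        pred = pred + gamma * h(x)   # F_m = F_{m-1} + gamma * h_m
        stumps.append(h)
    return lambda z: f0 + gamma * sum(h(z) for h in stumps)

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200)
y = np.sin(x) + rng.normal(0, 0.1, 200)
model = gradient_boost(x, y)
print("train MSE:", round(float(np.mean((model(x) - y) ** 2)), 4))
```

Each stump alone is a poor model; the sequence of residual fits, damped by \(\gamma\), is what recovers the signal.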
Model Stacking (1992)
Wolpert, David H. Stacked Generalization. 1992.
Train a “meta-model” on the predictions of independently trained base models.
Works best when base models are diverse and capture different aspects of the data.
e.g., sklearn.ensemble.StackingClassifier
Stacking in a Nutshell
Combine several different models by training a meta-model on their predictions.
Train M independent base models \((f_1, \dots, f_M)\) (e.g., linear model, tree, neural net, etc.).
Using an appropriate cross validation scheme, collect out-of-fold predictions for each training example to avoid leakage.
Train a meta-model \((g)\) on these predictions (optionally with the original features). \[
\hat{y}(x) = g\!\big(f_1(x),\, f_2(x),\, \dots,\, f_M(x)\big)
\]
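The recipe above, sketched with two deliberately simple toy base learners and a linear meta-model; all function names are hypothetical, and the cross-validation scheme is a plain 5-fold split for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = X[:, 0] - 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, n)

# Two toy base learners: ordinary least squares and 1-nearest-neighbor.
def fit_ols(Xtr, ytr):
    A = np.c_[Xtr, np.ones(len(Xtr))]
    beta, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    return lambda Z: np.c_[Z, np.ones(len(Z))] @ beta

def fit_1nn(Xtr, ytr):
    return lambda Z: ytr[np.argmin(
        ((Z[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1), axis=1)]

base_fits = [fit_ols, fit_1nn]

# Out-of-fold predictions: each row is predicted by models trained without it,
# which avoids leakage into the meta-model's training set.
k = 5
folds = np.arange(n) % k
oof = np.zeros((n, len(base_fits)))
for f in range(k):
    tr, va = folds != f, folds == f
    for m, fit in enumerate(base_fits):
        oof[va, m] = fit(X[tr], y[tr])(X[va])

# Meta-model g: linear regression on the out-of-fold base predictions.
meta = fit_ols(oof, y)

# Final predictor: refit base models on all data, feed their outputs to g.
full = [fit(X, y) for fit in base_fits]
predict = lambda Z: meta(np.column_stack([f(Z) for f in full]))
```

The linear model captures the trend in the first feature, the nearest-neighbor model the curvature in the second; the meta-model learns how to weight each.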
Ensemble Methods Summary
Voting: combine models, majority vote.
Boosting: sequentially build models, each correcting the previous.
Stacking: combine diverse models, leveraging their strengths.
Model Averaging is a special case of stacking: the meta-model is a weighted linear sum.
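As a sketch of that special case, the averaging weights can themselves be learned by least squares on the base models' predictions (synthetic data; the three "models" are just the target plus noise of different sizes):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=500)
# Stand-ins for three models' predictions: y plus independent noise.
preds = np.column_stack([y + rng.normal(0, s, 500) for s in (0.5, 1.0, 2.0)])

# Model averaging = stacking with a linear meta-model:
# solve for weights w minimizing ||preds @ w - y||^2.
w, *_ = np.linalg.lstsq(preds, y, rcond=None)
print("learned weights:", np.round(w, 2))
```

The fitted weights are largest for the least noisy model, exactly the behavior one wants from a meta-model.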
Surowiecki, James. The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies, and Nations. Doubleday, 2004.
For the crowd to be smarter than experts, we require:
Diversity of opinion → different perspectives reduce blind spots
Independence of members → avoid groupthink
Decentralization → empower local knowledge
Aggregation of information → combine insights effectively
The Common Task Framework (2007-)
Donoho, D. (2017). “50 Years of Data Science.” Journal of Computational and Graphical Statistics, 26(4), 745–766.
Define a clear task (e.g., image recognition).
Provide dataset + ground truth labels + hidden test set.
CrowdCent
Fundamental research from SumZero (analyst reports, ratings)
Market + crypto data, engineered features
Community:
Analysts generate qualitative ideas
Data scientists build quantitative models
Company: runs competitions, integrates data, manages portfolio construction & execution
CrowdCent – Role of Crypto
No native token (as of 2025)
Crypto as an asset class: community builds crypto strategies (Hyperliquid challenge)
Integration with crypto projects:
Fund that stakes in Numerai’s NMR ecosystem
Bridges decentralized funds together
Less “on-chain” than Yiedl, but embraces crypto markets and ethos
Conclusion
Numerai: pioneered crypto-incentivized crowdsourced hedge fund
Yiedl: DAO-based, on-chain hedge fund built entirely with DeFi
CrowdCent: blends human fundamental research with ML and crypto strategies
Common threads:
Community-driven intelligence
Machine learning aggregation
Cryptocurrency as incentive + infrastructure
Future hedge funds: open, decentralized, global
Human + Machine
Types of Collaboration
Horizontal
Vertical
Horizontal
Human forecasts concatenated with machine forecasts
Fit model on both
Vertical
Stepwise: human first, machine second (or vice versa)
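A sketch contrasting the two modes on synthetic data (the data-generating setup and all names are illustrative; predictions are in-sample for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
truth = x1 + x2 + rng.normal(0, 0.3, n)

def ols_predict(X, y):
    """Fit OLS on (X, y) and return in-sample predictions."""
    A = np.c_[X, np.ones(len(X))]
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ beta

human = x1 + rng.normal(0, 0.3, n)         # human forecast captures x1 only
machine = ols_predict(x2[:, None], truth)  # machine model built on x2 only

# Horizontal: concatenate human and machine forecasts, fit a combiner on both.
horizontal = ols_predict(np.column_stack([human, machine]), truth)

# Vertical: human first, machine second; the machine fits the human's residual.
vertical = human + ols_predict(x2[:, None], truth - human)

for name, pred in [("human", human), ("horizontal", horizontal), ("vertical", vertical)]:
    print(name, round(float(np.mean((pred - truth) ** 2)), 3))
```

Because the human and machine see different parts of the signal here, both collaboration modes beat the human forecast alone.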
Motivation
Canonical view: “Public” info is free and instantly priced (EMH).
Reality: Converting public data → stock picks is costly (analysts, data prep, modeling, infra, time).
Research question: How large are the economic costs of processing public information?
Study Design (High Level)
Population: 3,337 active, diversified U.S. equity mutual funds (1990–2020).
Build an AI analyst that uses only public data and realistic constraints.
Compare:
Human manager (actual holdings)
AI-modified (hybrid; selective replacements)
AI-only (full replacement within style constraints)
Measure: Dollar alpha and Sharpe; treat forgone gains as lower bound of managers’ marginal info-processing costs.
Key Findings (Punchline)
AI-modified: +$17.1M incremental per quarter vs. human; ~93% of managers outperformed over their lifetimes.
AI-only: +$17.2M incremental per quarter; ~42% allocated to style indices without hurting results.
Sharpe improves materially (details ahead).
Results robust to risk models, transaction costs, benchmark identification, and incentive heterogeneity.
Conceptual Framework
Investors process info until marginal cost = marginal gain.
If AI (public info only) can improve a manager’s portfolio under the manager’s constraints, the forgone gains represent a lower bound on the manager’s marginal info costs.
Human + machine is evaluated within fund style/risk/size/liquidity constraints.
Portfolio construction and evaluation are within DGTW groups to preserve style.
ML Pipeline (Pseudocode)
# =========================
# FEATURES & LABELS
# =========================
for each month t in 1980..2020:
    X_t := features available by end of t-1
           (market, accounting (lagged), IBES, EDGAR text, macro, ratings)
    y_t := realized stock return over t+1..t+3 (DGTW-adjusted preferred)

# Preprocess within training folds only:
# - Winsorize features (type-specific rules)
# - Impute missing (numeric: quarter-mean; categorical/flags: 0)
# - Standardize numeric using train stats

# =========================
# ROLLING EXPANDING TRAINING
# =========================
for prediction year Y in 1986..2020:
    train_period := 1980-01 .. (Y-1)-10   # end in Oct Y-1 to avoid quarter overlap
    # time-series cross-validated randomized hyperparameter search
    # (RF depth, trees, min split/leaf, mtry)
    fit RandomForest_Y on train_period to minimize validation MSE

    # =========================
    # PREDICT MONTHLY IN YEAR Y
    # =========================
    for each month t in Y:
        ŷ_t := RandomForest_Y.predict(X_t)
        # also compute within-DGTW ranks/deciles for portfolio rules

# Optionally also fit a Neural Network model; ensemble via average rank to get a small lift.
What Drives Predictions?
Permutation importance analysis shows simple features are highly influential:
Market value, dollar volume, trading activity, earnings-forecast signals.
RF captures nonlinear interactions among simple predictors.
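Permutation importance itself is easy to sketch: shuffle one feature column and measure how much the model's error rises. A toy version on synthetic data, with a linear model standing in for the paper's random forest:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
# Only the first feature matters in this toy target.
y = 2.0 * X[:, 0] + rng.normal(0, 0.5, n)

# Fit a simple model (OLS here, purely for illustration).
A = np.c_[X, np.ones(n)]
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
predict = lambda Z: np.c_[Z, np.ones(len(Z))] @ beta

base_mse = np.mean((predict(X) - y) ** 2)

# Permutation importance: shuffle one column, record the rise in MSE.
importance = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    importance.append(float(np.mean((predict(Xp) - y) ** 2) - base_mse))

print(np.round(importance, 3))
```

Shuffling the informative feature destroys the model's accuracy; shuffling the irrelevant ones barely moves the error.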
Portfolio Construction – Shared Constraints
No shorting; quarterly rebalance only.
Within-style swaps: replacements must come from the same DGTW group.
Depth/liquidity: cap any single holding to ≤ 20% of the stock’s market cap (clip and keep overflow).
No duplicate replacements (“without replacement” within a fund/quarter).
Payout convention: AI’s incremental gain is paid out quarterly so human and AI start next quarter with equal AUM (conservative for AI in dollars).
AI-Modified (Hybrid) – Pseudocode
inputs:
    w_h[j]      # human start-of-quarter weight for stock j
    g[j]        # DGTW group of stock j
    decile[j]   # predicted decile within g[j] (1=worst ... 10=best)
    ŷ[j]        # predicted DGTW-adjusted return
    NAV         # fund net asset value at start of quarter

initialize:
    w_ai := w_h
    used := ∅   # prevent duplicate use of replacement names

# Keep strong human picks
for j in holdings sorted by descending w_h[j]:
    if decile[j] == 10:
        used.add(j)

# Attempt upgrades for others (largest positions first)
for j in holdings sorted by descending w_h[j]:
    if decile[j] in 1..9:
        group := g[j]
        C := { s in group | decile[s] == 10 and s ∉ used }
        if C ≠ ∅:
            # choose best candidate
            k := argmax_{s∈C} ŷ[s]
            target_value := w_h[j] * NAV
            max_value := 0.20 * market_cap(k)
            delta_value := min(target_value, max_value)
            w_ai[k] += delta_value / NAV
            w_ai[j] -= delta_value / NAV
            used.add(k)

# Replace remaining bottom-decile names with the group index
for j in holdings:
    if decile[j] == 1 and w_ai[j] > 0:
        idx := index_for_group(g[j])
        w_ai[idx] += w_ai[j]
        w_ai[j] := 0

# Normalize / clean
project w_ai onto the simplex (weights ≥ 0, sum = 1)
AI-Only (Full Replacement) – Pseudocode
inputs as above

initialize:
    w_ai := 0
    used := ∅
    backlog[group] := 0 for all groups

# Map each human slot to a top-decile name in the same group
for j in human holdings sorted by descending w_h[j]:
    group := g[j]
    C := { s in group | decile[s] == 10 and s ∉ used }
    if C ≠ ∅:
        k := argmax_{s∈C} ŷ[s]
        target_value := w_h[j] * NAV
        max_value := 0.20 * market_cap(k)
        delta_value := min(target_value, max_value)
        w_ai[k] += delta_value / NAV
        used.add(k)
        # any leftover because of the cap stays to be assigned:
        if delta_value < target_value:
            backlog[group] += (target_value - delta_value) / NAV
    else:
        backlog[group] += w_h[j]

# Push any leftover weight to the group index
for each group:
    if backlog[group] > 0:
        idx := index_for_group(group)
        w_ai[idx] += backlog[group]

project w_ai onto the simplex
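Both pseudocode blocks end by projecting weights onto the simplex. A standard sort-based Euclidean projection, shown as a sketch (a common algorithm, not necessarily the paper's exact implementation):

```python
import numpy as np

def project_to_simplex(w: np.ndarray) -> np.ndarray:
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} via the sort-based method."""
    u = np.sort(w)[::-1]                      # sort descending
    css = np.cumsum(u)
    # largest index where the shifted value would still be positive
    rho = np.nonzero(u + (1 - css) / np.arange(1, len(w) + 1) > 0)[0][-1]
    theta = (1 - css[rho]) / (rho + 1)        # uniform shift
    return np.maximum(w + theta, 0.0)         # shift and clip

w = project_to_simplex(np.array([0.5, 0.6, -0.2, 0.3]))
print(np.round(w, 3), w.sum())
```

The negative weight is clipped to zero and the remainder is renormalized so the portfolio stays long-only and fully invested.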
Performance Measurement – Dollar Alpha
Let \(R^m_{i,q}\) be fund \(i\)'s gross return from observed holdings in quarter \(q\).
Let \(R^b_{i,q}\) be the corresponding DGTW benchmark return (same weights, matched groups).
Dollar alpha (human): \[
V_{i,q} = \text{Assets}_{i,q-1}\,\big(R^m_{i,q} - R^b_{i,q}\big)
\]
Incremental dollars (AI over human): \[
Z_{i,q} = \text{Assets}_{i,q-1}\,\big(R^{AI}_{i,q} - R^m_{i,q}\big)
\]
Aggregate to lifetime averages per fund, then time-weighted across funds.
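A worked toy example of the two dollar quantities (all numbers hypothetical):

```python
import numpy as np

# Toy quarterly series for one fund (hypothetical numbers).
assets_prev = np.array([1.0e9, 1.1e9, 1.05e9])  # Assets_{i,q-1}
r_fund  = np.array([0.030, -0.010, 0.020])      # R^m: gross return on observed holdings
r_bench = np.array([0.025, -0.005, 0.015])      # R^b: DGTW benchmark return
r_ai    = np.array([0.034, -0.004, 0.021])      # R^AI: AI portfolio return

dollar_alpha = assets_prev * (r_fund - r_bench)  # V_{i,q}: human dollar alpha
incremental  = assets_prev * (r_ai - r_fund)     # Z_{i,q}: AI incremental over human

print("avg dollar alpha ($M/quarter):", round(dollar_alpha.mean() / 1e6, 2))
print("avg AI incremental ($M/quarter):", round(incremental.mean() / 1e6, 2))
```

Scaling return spreads by lagged assets is what turns percentage alpha into the paper's dollar metric.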
Sharpe Ratio (Definition & Use)
For fund \(i\) with quarterly excess returns \(ER_{i,t} = R_{i,t} - R^{rf}_t\):